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Abstract 

We consider Holder smoothness classes of surfaces for which we construct piecewise 
polynomial approximation networks, which are graphs with polynomial pieces as nodes 
and edges between polynomial pieces that are in 'good continuation' of each other. 
Little known to the community, a similar construction was used by Kolmogorov and 
Tikhomirov in their proof of their celebrated entropy results for Holder classes. 

We show how to use such networks in the context of detecting geometric objects 
buried in noise to approximate the scan statistic, yielding an optimization problem akin 
to the Traveling Salesman. In the same context, we describe an alternative approach 
based on computing the longest path in the network after appropriate thresholding. 

For the special case of curves, we also formalize the notion of 'good continuation' 
between beamlets in any dimension, obtaining more economical piecewise linear ap- 
proximation networks for curves. 

We include some numerical experiments illustrating the use of the beamlet net- 
work in characterizing the filamentarity content of 3D datasets, and show that even a 
rudimentary notion of good continuity may bring substantial improvement. 
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1 Introduction 



1.1 From function approximation to set approximation 

An important trend in Approximation Theory and Harmonic Analysis focuses on designing 
dictionaries (e.g. orthonormal bases) {4>n} well-adapted to a given function class J-', in the 
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sense that any / G is well-approximated by a linear combination of a few functions from 
the dictionary: 

nGAf(/) 



Examples of dictionaries include the Fourier basis, polynomials, splines 
functions, wavelets |44| and others such as wedgelets |2J], platelets 



rigelets [13|, curvelets [16|, chirp lets [14 
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531], radial basis 
bandelets [39], 



In the context of function estimation in additive white noise |32l . |2J, |2l| , approximations 
by sums of atoms as in ([T]) are particularly suitable. Consider instead a setting where 
a geometric object (i.e. a set) is buried in noise or clutter, a setting considered e.g. in 



23l . |6j. This is for example relevant in target tracking, where the object of interest is the 



target's trajectory often modeled as a curve. In this setting, set approximation plays the 
role of function approximation, and the aim of the present paper is to develop strategies to 
compute approximations of sets by unions of simple building blocks akin to how beamlets 
are used to approximate curves in [23i] . 

Since a set may be equivalently represented by its indicator function, approximation of 
sets may appear to be a special case of approximation of functions. Though indeed closely 
related, function approximation does not directly translate into set approximation. In part, 
this comes from the fact that the image of a sum of functions is in general not the union of the 
functions' images. However, when a parametrization of the set is available, approximating 
the parametrization in piecewise fashion (i.e. when the supports of the functions involved 
in the sum do not overlap) does result in a proper approximation of the set itself. This is 



38, 22 in the 



for example the case when the set is the graph of a function, considered in 
context of image processing. 

The selection of building blocks in set approximation is also not necessarily parallel to 
that of atoms in function approximation. The main difference is in the fact that overlapping 
building blocks create redundancy, while atoms with overlapping supports may cancel each 
other in appropriate ways to fit the target function, e.g. in trigonometric, polynomial or 
wavelet expansions. Typically, the approximation ([1]) is constructed by first computing the 
coefficients a„, often as the inner product of / and 0„, and then keeping the largest ones 



in absolute value, which is in effect equivalent to thresholding [2]|. A similar strategy may 
be implemented for sets, where building blocks with the most overlap with the set (as a 
fraction of their size) are selected; this corresponds to the simplest beamlet-based algorithm 



presented in j23|]. However, the result is often redundant, as nearby building blocks tend 



to have a similar overlap with a given set. This is avoided in the beamlet-based coder for 



lap 

curves introduced in 30|, by choosing at most one beamlet per dyadic square in a a given 
recursive dyadic partitioning. The strategy we adopt here consists of looking for a union 
of building blocks that obey some sort of 'good continuation'. In the approximation of 
curves, this corresponds to the most complex algorithm presented in 23|] which amounts to 
chaining beamlets together that share one endpoint and have similar orientations. This is 
formalized in [tI for curves that are graphs of Holder functions and in 15] for chirps (highly 
oscillating functions), chaining chirplets in good continuation. Good-continuation principles 



are also considered in [ISj, [IQj inspired by the Gestalt theory of vision, with applications to 



the detection of (parametric) geometric object in images. 
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1.2 Networks of polynomial pieces 

We define a network of polynomial pieces to be a grapli witli nodes indexing polynomial pieces 
and edges between polynomial pieces in good-continuation. (We use the word 'network' 
instead of 'graph' so as to avoid confusion with the notion of 'graph of a function'.) We 
construct such networks to approximate surfaces of varying dimension and smoothness. Just 
as beamlets are line-segments spanning a wide range of location, orientations and scale, so 
do the polynomial pieces. The networks are akin to that of 0] in that the notion of good- 
continuation is explicit and the scales are not mixed together, thus better adapted to surfaces 
with homogeneous smoothness, e.g. graphs of Holder functions. We do, however, suggest 
ways to go multiscale. 

In the 1950's, Kolmogorov and Tikhomirov used piecewise polynomial approximations 
together with a similar kind of good-continuation notion to bound the e-entropy of Holder 
function classes [35]. Our construction may be seen as a formalization of their approach, 
with emphasis on the approximation of sets instead of functions. Note that this ancestry 
was discovered after the fact; the present perspective was indeed independently suggested in 
jil with the intention of generalizing the system used in [3]. 

1.3 The special case of beamlets 



Beamlets were introduced in [23|] for the explicit purpose of approximating curves in 2D, with 



a 3D version latter developed in [25||. A variety of algorithms are proposed in |23j, where 
the more elaborate ones are based on the chaining of beamlets in good continuation. How- 
ever, the notion of good-continuation remains implicit and only palpable through numerical 
experiments. 

We formalize here this notion of good-continuation for beamlets. This was previously 
done in [3] for a beamlet-like system built to detect graphs of Holder functions. In the 
resulting beamlet network, small beamlets have a large number of neighbors; this seems 
unavoidable if the network is to enable accurate approximation of curves. To address this 
issue, we develop an alternative network of line-segments, with emphasis on developing an 
economical system both in terms of number of nodes and connectivity. 

1.4 Application to the analysis of point clouds and images 
1.4.1 Detection of geometric objects 

Consider a simple model for detection, where we observe a point cloud (i.e. a spatial point 
process) and the goal is to decide whether the points were generated uniformly at random 
or a fraction of them were sampled from a geometric object (i.e. a surface) belonging to a 
given class. 



This is a standard setting where the scan statistic is used [28|, l27j, which consists in com- 
puting the largest number of points belonging to one of the objects of interest. Though the 
scan statistic achieves the best known detection rates, both for parametric [gI and nonpara- 
metric classes of objects, it is not computationally friendly as it involves an optimization 
over a large (function) class of objects, though there are exceptions [3l|. Instead, we propose 
to replace the optimization over the class of objects with an optimization over paths in an 
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approximating network. The idea of replacing an optimization over a function space with an 
optimization over a carefully constructed graph is quite natural, and in fact appears in other 



situations, e.g. in the computation of minimal surfaces 12|, |37|, l3J, l29|. In our particular 
case, the resulting optimization problem is akin to the Budget-Reward problem 17|; though 
still NP-hard, this problem admits polynomial time approximations. Moreover, we show 
that this approximation achieves the best known detection rates. 

Note that the optimization above is computationally more tractable (e.g. using dynamic 
programming ideas) when the network is direct and acyclic, which is the case for example 



41 , where a beamlet network is used to track the 



in multiframe target tracking; see e.g. 
time-space primitives. 

We also consider an alternative approach based on computing the size of the longest path 
in the network after thresholding, which is suggested in [23|. Dynamic programming ideas 
may also be implemented here, as done in [3]. We show that this method also achieves the 
best known detection rates. 

We mention that the same approaches may be implemented in the setting of an image, 
where a geometric object is buried in white Gaussian noise. 



1.4.2 Characterization of spatial distributions 

In Astrophysics, the study of the galaxy distribution involves quantifying the content in 
filaments, sheets and blobs in 3D galaxy catalogs 45, Ultimately, scientists would like to 



know which of the many cosmological models is best (and well) supported by observations. 
Practically, the task is to meaningfully compare simulated galaxy distributions from various 
cosmological models with the observed galaxy distribution. 

With the presence of highly anisotropic features such as filaments, traditional tools for 
analyzing spatial data become irrelevant, among them the classical two-point correlation 
function. Instead, a method based on beamlets is very attractive, as beamlets provide good 



approximations for filaments. And indeed, 20|] presents evidence that beamlets are useful 



at separating various cosmological models, even though the algorithm implemented in [2C 
is of the simplest kind and in particular does not involve chaining (i.e. good-continuation). 
We perform a number of numerical experiments on simulated data that show that chaining 
may bring substantial improvement. 

Note that such tools are in demand in other scientific fields, such as Medical Imaging, 
for example in the examination of vascular networks S^l or cancer cells |43i] . 



1.5 Contents 

The contents are organized as follow. In Section 2, we introduce piecewise polynomial net- 
works designed to approximate surfaces of any intrinsic dimension and (Holder) smoothness. 
In Section 3, we consider the detection of geometric objects buried in noise and develop 
methods based on these networks. In Section 4, we formalize a notion of good-continuation 
for beamlets in arbitrary dimension and show that the resulting network has desirable ap- 
proximation properties for curves. In Section 5, we perform some numerical experiments 
showing that the notion of good-continuation may bring practical improvement. Some of 
the proofs and technical arguments are gathered in the Appendix. 
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2 Networks of Polynomial Pieces 



We build an explicit, algoritlimically friendly approximating network for Holder surfaces, 
i.e. graphs or images of Holder functions. As a Holder function is well-approximated locally 
by a polynomial, in fact its Taylor expansion, it is natural to construct piecewise polyno- 
mial approximations. The idea is to partition the unit hypercube into smaller hypercubes 
and in each smaller hypercube provide a choice of approximation by polynomials; since the 
functions we consider are uniformly smooth, approximations in nearby hypercubes should 
be close, which we formalize as a neighboring condition. We thus form a network with nodes 
indexing polynomial pieces and edges linking those in good-continuation, restricting the pos- 
sible combinations to those useful in approximating functions of given smoothness, in such a 
way that functions and certain connected components in this network are in correspondence. 
Though we build a different network for each smoothness class, it is possible to discretize the 
range of parameters resulting in a dyadic organization of this family of networks by scale. 
Little known to the community, a similar construction was used by Kolmogorov and 



Tikhomirov in their seminal work on e-entropy of Holder function classes (and others) [36|, |35 
Note that the present construction was independently suggested in jil, as a generalization 
of the beamlet-like system used in [Tj. 

We first introduce some notation. For i G {l,...,k}, let ej denote the ith canonical 
vector in R'^. For a vector x = (xi, . . . , x^) G M'^, its supnorm is defined as ||x|| = max{|xj| : 
i = 1, . . . , k}. For s = (si, . . . , s^) G N^, let s! = Si! ■ ■ ■ s^! and |s| = Si + ■ ■ ■ + Sk- For a 
function / and s = (si, . . . , s^) G N'', f^^^ = d^\ - ■ ■ d^lf- 

We define the following constants: 

|s| = H • |s|<LqJ 

Note that ci < exp(A;) and C2 < exp(/c/2). 



2.1 Holder smoothness classes 

For a,/3 > 0, define TC''{a,l3) as the Holder smoothness class of [aj-times differentiable 
functions functions / : [0, l]'^ — [0, 1] satisfying: 

|/(^)(x)| </5, VxG [0,lf, VsgNMsI < [aj; (3) 
|/(^)(y)-/(^)(x)| </5||y-xr-L-J, Vx,yG[0,lf, Vs G NMs| = [aj. (4) 

Example, (/c = 1, a G (1,2]) 

\f\x)\<P, VxG[0,l]; 
\f{y)~f{x)\<P\y-xr\ \/x,ye[0,l]. 

Functions in 1-L''{a,(3) are uniformly well-approximated locally by polynomials, specifi- 
cally their Taylor expansions. For / G T-C^{a,l3) and x G [0, 1]'^, the Taylor expansion of / 
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at X of degree [a\ is defined as follows: 

/x(y)= E /(^Hx) n^^^^. 

Example, {k = 1, a e (1,2]) 

fx{y) = f{x) + f'{x){y - x). 
Lemma 2.1. For any f E l-i^{a,(3), 

|/(y)-/x(y)| <ci/3||y-xr, Vx,y G [0,1]^ 
Proof. A Taylor approximation of degree \_a\ gives: 

ivi - XiY 



/(y) = /x(y)+ E if^^\z)-f(^\^)) H ^ 

III -1 ^i' 

s| = [q:J 1=1 



for some z on the segment joining x and y. Hence, 

|/(y) - /x(y)| < cilly - x||L-J max |/(^)(z) - /(^)(x)|. 

|s| = [qJ 

Now apply (jl]) and the fact that ||z — x|| < ||y — x|| to get 

- f^'\^)\ <f3\\y- xf-L^J, vs G N^ |s| = [a\. 



□ 



2.2 Nets of piecewise polynomials 

We now build a family of nets for H^{a,P) by dividing [0,1]'^ into hypercubes and then 
offering a choice of approximation by polynomials within each hypercube, which is most 
relevant in view of Lemma 12.11 Fix A G (0, 1) and 6 > 0, and define Sg = A~^6, s = 
0, . . . , \_a\. In the construction that follows, the parameter A quantizes the variable space, 
while each parameter Sg quantizes the range of values of derivatives of order s of functions in 
T-C^{a,f3). Note that the quantization is coarser for higher order derivatives, and specifically 
chosen so that the approximation result in Lemma [2.21 below holds. 

Divide [0, 1]*^ into hypercubes indexed by m G {1, . . . , A"^}'^ of the form: 

k 

Im = Yl[{m,-l)A,miA]. 

i=l 

Let Xm = {xm,i, ■ ■ ■ ,Xm,k) dcuotc the center of Jm, i.e. x^i = {rrii — 1/2)A. On each 
hypercube /m, consider polynomials of the form 

P„.,h(x)= Yl h^'^^l^l n^^i^^, (5) 
|s|<N *=i 

where h = {h^^^ : |s| < [aj), with G Z and \h^^^S\s\ \ < p. 
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(m-l)A 



X 



m 



mA 



(a) Linear 



(b) Higher order 



Figure 1: Examples of polynomial pieces for different choices of A and 6, together with their 
associated region as defined in 

Example, {k = 1, a e (1, 2]) For h = h^^^), 

Pm,h{x) = h^^^6 + h^^^6i{x - Xm). 
For / G H''{a,f3), let h'^^^m, f) denote the closest integer to /'■^■'(xm)/(5|s|. 
Lemma 2.2. For f G n''{a, (3) and m G {1, . . . , , 

|/(x) -p„.,h{m,/)(x)| < (ci/2")/3A" + (c2/2)5, Vx G 
Proof. The proof of Lemma 12.21 is in the Appendix. □ 
Partly justified by Lemma [2.21 we now assume that 



The system {pm,h} is therefore rich enough to provide a certain degree of approximation 
locally. The same degree of approximation may be achieved globally by simply considering 
functions that coincide with the polynomials above within each hypercube, namely 



where this time h also depends on m, representing a different choice for each hypercube. 

As Kolmogorov and Tikhomirov realized, generating a g\i by simply picking a polynomial 
in each hypercube independently would result in a wasteful system, for in fact polynomials 
in neighboring hypercubes may be restricted to have similar coefficients. This comes from 
the fact that the derivatives up to order [a\ of a function in / G T-l^{a,l3) have a certain 
degree of smoothness. 

Lemma 2.3. For f G T-C''{a,(3) and s such that |s| < [aj, 



6 = ci/3A°. 



(6) 




m 




*<L"J-|s| 



for all X G [0, 1]^, i 



1, . . . ,k and rj such that x + rjei G [0, 1]'^. 
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Proof. As with Lemma [2TTI perform a Taylor approximation of degree \_a\ — |s| along ej and 
apply (jl]). In fact, /? may be replaced by /9/([«J — |s|)!. □ 



2.3 Approximating networks for Holder graphs 

For a function / : [0, l]'^ — > [0, 1], define its graph as 

graph(/) = {(x,/(x)):xG[0,l]'=}. 

Assuming (Q, we define a network of polynomial pieces G^{a,P) with the property that a 
certain kind of connected components index piecewise polynomial approximations for graphs 
of Holder functions. The network f3) has nodes of the form (m, h) indexing the polyno- 

mials Pm,h defined in ([5]). Two nodes in the network (m, h) and (m^, h^,) are neighbors if the 
corresponding hypercubes, /m and 1^^, are adjacent, and if the corresponding polynomials, 
Pm,h and Pm*,hv,, and their derivatives assume nearby values both at x^ and Xm^. Formally, 
this corresponds to = m + ^e^ for some i G {1, . . . ,k} and C, G { — 1, +1}, and for all 
s G N^ |s| < [aj. 



*<N-|s| 



< 3 and 



t<H-|s| 



< 3. 



(7) 



This last property is a discrete version of Lemma 12.31 



Example. (A; = 1, a G (1,2]) The nodes (m, /i^^)) and (m^, /iF, /li^^) are neighbors if 
— ml = 1 and 



- < 3, - - (m, - m)/i(i)| < 3, - h^^^ + (m, - m)/i«| < 3. 



(a) In good continuation (b) Not in good continuation 

Figure 2: Examples of polynomial pieces in good continuation (left) and not in good contin- 
uation (right). 

The family of networks G^(a, P) effectively generalizes the system presented in [3] for the 
case = 1, a G (1, 2]. 
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Define constants 

^ / s + k-1 \ ^ f s + k-1 \ 

s=Q ^ ^ s=0 ^ ^ 

Lemma 2.4. /ias ©(/J^^^-^^A^^-fc) = c»(/5^3-("4-'=)/"5-"3+(c4-fc)/a) ^^(ies an(i eac/i 

node has at most 2k6'^^ neighbors. 

Proof. The proof of Lemma 12.41 is in the Appendix. □ 

Example. (A; = 1, a G (1,2]) has 0{P6~^) nodes, each with degree at most 72. 

To each vertex (m, h) we associate a region around the graph of Pm,h restricted to Jm, 
of thickness given by the error bound of Lemma 12.21 

R{in,h) = {(x,z) el^x [0,1] : ^ - j)^,h(x)| < Co<5}, (8) 

where Cq = (2~" + C2/2). See Figure [H 

Example. (A; = 1, a G (1,2]) The regions are in this case parallelograms. 
For a subset of nodes vr, define R{7i) = IJ(m h)e7r h). 

Let Ilg{a,P) denote the set of connected components of G|(a,/3), homeomorphic to the 
square grid {1, . . . , A~^}'^. 

Theorem 2.5. For each f G TC^{a,P), there is a connected component n G Il'^{a,P) such 
that graph(/) C R{tc). 

Proof. The proof of Theorem 12.51 is in the Appendix. □ 



Figure 3: Example of covering of an Holder graph (blue) with polynomial (linear) pieces in 
good continuation (red). 

Theorem 12.51 implies that the system of functions 

l{xe/m} Pm,h, 7rGn5(a,/3), 

(m,h)G7r 

is an £-net for T-C''{a,(3) with e = cqS. From Lemma [2.41 and Theorem 12.51 we see that the 
system has entropy of order 0(e~'^/"), which is essentially the smallest possible [3^ . 
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2.4 Approximating networks for Holder immersions 

Define H^''^{a,l3) as the class of functions / : [0, 1]'' — > [0, l]'' with coordinates in H^{a,f3), 
i.e. / = (/i, . . . , /rf) with fr e H\a, /3) for r = 1, . . . , 
For a function / : [0, l]'^ — * [0, 1]°', define its image as 

im(/) = {/(x):xG[0,lf}. 

An approximation network for images of Holder functions is simply built out of a tensor 
product of copies of the network built in the previous section. Specifically, define the net- 
work G^''^(a, with nodes of the form (m, hi, ... , h^), indexing the multivariate polynomial 
(Pm,hi, • • • ,Pm,hd)j cind cdgcs bctwecn nodes indexing polynomial pieces on adjacent hyper- 
cubes and satisfying ([71) coordinate- wise. 

Example, {k = 1, a G (1,2]) The polynomial pieces are of the form: 

{hfh + h^^^5i{x - hfd + h^^hi{x - x^)), \x-x^\< A/2. 
Lemma 2.6. G^''^(a,/3) has 0(/5'^^35-<ic3^dc4-fc) ^ Q^i^dc3-{dc^-k)/a^~dc3+{dcA~k)/^'^ nodes and 
each node has at most 2kQ'^^ neighbors. 

Proof. The proof follows that of Lemma 12. 4[ Details are omitted. □ 

Example, (fc = 1, a G (1,2]) G>Y{a,P) has 0(/32-i/"^-4+i/a) nodes, each with degree at 
most 2592. 

As before, to each vertex (m, hi, . . . , h^) we associate a region around the image of 
(Pm,hi, • • • ,Pm,hd) restricted to Jm, of thickness given by the error bound of Lemma [2^ 

i?(m, hi, . . . , hrf) = {z G [0, 1]"^ : 3x G [0, l]^ Vr = 1, . . . , c?, - p^,h. (x) | < Co5}. 

Let n^''^(a;,/3) denote the set of connected components of G^''^(a,/?), homeomorphic to 
the square grid {1, . . . , A~^}'^. 

Theorem 2.7. For each f G 7^'^''^(a, /5), there is a connected component n G Il'^''^{a,P) such 
that im(/) C i?(7r). 

Proof. The proof follows that of Theorem 12. 5[ Details are omitted. □ 

2.5 Networks organized by scale 

In practice, the parameters a and (3 are often unknown, so that it becomes necessary and/or 
useful to look through a discrete set. We introduce a scale parameter and another parameter 
indexing the approximation order, and organize the graphs accordingly. Fix J G N as the 
maximum scale and a sequence aj ^ oo as J —>■ oo. At scale j G {0, . . . , J} and approxi- 
mation order i G N, let G^_j('-) (resp. G^j('-)) be defined as Gg{a,l3) (resp. Gf'^{a,l3)) with 

[a] =L,A = 2'^, 6 = 2^-^ and \h^^'>S\^\ \ < a^f^\ 

We have the following corollary of Theorems 12.51 and 12. 7[ 

Proposition 2.8. For a,l3 > 0, let j = j{a,l3) = |" '^+^°^^^^^) "] , Assume J is large enough 
that j{a,f3) < J and aj > (3. Also, take l = [_a\ . Then the covering result of Theorem \2.5\ 
(resp. Theorem \2. 7p applies with (resp. G^'j{l)). 

Proof. This is a simple corollary of Theorems 12.51 and 12. 7[ □ 
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2.5.1 A single multiscale network 



The networks G^j(i), j = 1, . . . , J, l G N, constitute a family of monoscale networks. 
Instead, one may want to mix scales together (and possibly mix approximation orders too, 
though not done here) so as to better approximate functions with varying smoothness, akin to 



how Besov functions are decomposed into a sum of wavelets at various scales [46|. Specifically, 
consider J-'^{l) to be the class of t-times continuously differentiable functions of the form 
/ = J2p£P fpXp^ where P is a finite partition of [0, 1]*^ into regions with boundaries of finite 
length and fp G H''{ap,Pp) with ap G {L,t + 1] and Pp > 0. For such a function class, we 
may want to involve a number of scales, each adapted to a different smoothness degree. 
In particular, such a multiscale approximating network may be built out of the union of 
j — 1) • • • ) with additional edges between nodes at different scales. The neigh- 
boring condition across scales may be chosen to be identical to that defined in Section 12.31 
namely that (m, h) G Gjj and (m^, h^) G j are neighbors if Jm and Jm* are adjacent, 
and if Pm,h and Pm^^.h^, together with their derivatives assume nearby values both at and 
Xm*- (The condition translates into a precise statement involving h and akin to ([7]), yet 
more cumbersome. We omit details.) Let Gj(t) denote this multiscale network. 

Given a function / G f = "^p^p fpXpy we use a recursive dyadic partitioning 



(RDP), the cornerstone of many multiscale algorithms |44j . |23| . |22| . |39|, to approximate the 
partition P. Specifically, we start at j = and then recursively subdivide each dyadic 
hypercube S at scale j until j > j{ap,Pp) for all p G P with |p fl 51 > 0. We then use a 
polynomial piece from Gj within each RDP cell at scale j. See Figure HI 

For a subset of nodes vr in define P(vr) = IJ(m h)e7r -^('^' ■'^)' where each region is 

defined with the appropriate scale. 

Proposition 2.9. For each function f G there is a connected component vr within 

Gj(i) in correspondence with an RDP such that graph(/) C R{ti). 

Proof. This is essentially a corollary of Proposition 12.81 Details are omitted. □ 

Note however that the typical degree of a node in Gj(i) increases with J, as a result of 
connecting nodes across scales. 



3 Detection of Objects in Point Clouds and Images 

We consider a simple model of detection of objects in point clouds and design some algorithms 
based on the networks introduced in Section [2l The objects are assumed to be of Holder 
type. Fix an Holder class H'^''^~'^{a, f3). For / G H'^''^~'^{a, /3) and rj >0, define 

graph,(/) = {(x, z) G [0, l]'^ : ||z - /(x)|| < r^}, 

which is a region centered around the graph of / and of thickness 2ri. 

Suppose we observe a point cloud Xi, . . . ,X„ G [0, l]'^, and want to decide between the 
following two hypotheses (generative models): 

Ho: Xi, . . . , X„ Uniform[0, l]'^; 

Hi : Xi, . . . , X„ (1 - £„)Uniform[0, 1]^ + £„Uniform(graph^(/*)), 

for some (unknown) /* G 7i^''^~'^(a, /3). 
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Figure 4: A partition into two regions pi (white) and p2 (grey) associated with some function 
/ = fpiXpi + fp2Xp2, with corresponding scales j(api,/3pi) = 1 and j{ap2,(3p^) = 3. 



The same situation was considered in 0, 0, Isl . 

For a (measurable) set S C [0, l]'^, let N{S) denote the number of data points belonging 
to S, which under the null has the binomial distribution with parameters n and 15*1^, the 
(i-dimensional Lebesgue measure of S: 

N{S) = #{z : X, G 5} Bin(n, |5|,). 



3.1 Generalized Likelihood Ratio Test 

If /* were known, the most powerful test would be the likelihood ratio (i.e. Neyman- Pearson) 
test, which rejects for large values of A^(graph^(/*)). The scan statistic is the maximum over 
those statistics: 

M^'\a,(3)= max iV(graph^(/)). 

The generalized likelihood ratio test (GLRT) rejects when M^''^{a,l3) is large. 
Define 

^^^'^'"^= fc + a(^rf-A:) - 
Theorem 3.1. There are constants A,B>0 not depending on n such that 

Proof. The lower bound is obtained by interpolation of carefully selected points, while the 
upper bound is obtained using a precise-enough net for TC''''^~^{a, (3) of near-optimal entropy 
(such as introduced in Section[2]). We refer the reader to Section 2.3 in [sl for more details. □ 

Since M^''^{a,P) > A^(graph^(/*)) and graph^(/*) contains at least an e„ proportion of 
the point cloud (roughly), the GLRT asymptotically separates Hq and Hi if, for some fixed 
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B' > B, En > B'{nf V ri'^~^n), meaning that both the probabihties of false alarm (type I 
error) and missed detection (type II error) tend to zero as n increases. 

The GLRT as defined above is challenging, if not impossible to compute exactly. We 
use instead an approximation based on the coverings constructed in Section [2], turning an 
optimization over a functional space into an optimization over paths in a network, for which 



a large number of algorithms have been developed. See also |34j . |12[ |. where variational 
problems related to computing minimal surfaces are turned into combinatorial optimizations 
over paths and other structures within networks. 

Recall the notation used in Section 2, and assume again that A and S are related according 
to (E]). We use the network G^''^~*^(a, j3) to build appropriate coverings, this time with slightly 
enlarged regions: 

/?5(m,hi, . . . ,hd_fc) = {(x,z) : x G /m,Vr = l,...,d-k, \zr -Pm,h.(x)| < (cq + 1)5}, 

where cq = (2"" + C2/2) as in ([8]). 

Fix a path {mzz(t) : t = 1, . . . , A~'^} in the square grid {1, . . . , A~^}*^ covering the whole 
grid in zig-zag fashion, and define Vg''^~'^{a, (3) as the set of paths in G^''^~'^{a, jd) of the form 

{(m,,(t), hi(t), . . . , hd-fc(t)) : t = 1, . . . , A-'^}. 

Also, recall the definition of n^''^~'^(a, (3) in Section[2l Now consider the following alternative 
statistics: 



Mn'^(a,/5)= max N{Rs{'k)), M!^''l{a, (3) = max N{Rs{P)). 
In that case, 



M^''^(a,/?) < M^'j(a,/?) < M^'j(a,/5). 

The first inequality comes from Theorem 12.51 and the triangle inequality; the second from 
the fact that Uf''~\a,P) C Vf {a,l3). Actually, by Theorem [321 below, with a proper 
choice for 5 all three are of same order of magnitude with high probability. 

Theorem 3.2. With 5 = n^^/C'+o'id-k)) y there is a constant C = C{k, d, a, /3) such that 
P |M^'j(a,/3) < C{nP V r]'^-''n)\Ho^ ^1, n ^ 00. 

Proof. The proof of Theorem 13.21 is in the Appendix. □ 
Computing Mp^(a,/5) may be done efficiently using dynamic programming ideas, for 



example as implemented in [15|]; this is due to the fact that the paths in V^''^ ^{a,(3) are 



oriented and with no loops. Though M^'^(q;,/9) provides a better approximation to the scan 
statistic, we do not know of an efficient way to compute it directly. 

The results above hold for Holder immersions as well, though in that case the compu- 
tations are much more challenging. This comes from the fact that Holder immersions may 
self- intersect, so that dynamic programming approaches do not apply. In fact, if the optimiza- 
tion is over all paths of length A"'^ instead, the setting is equivalent to the Budget-Reward 



Problem 17| (also called Bank Robber Problem), closely related to the Prize-Collecting 
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Traveling Salesman problem though those problems are known to be NP-hard, there 
are polynomial-time approximations 17|. Other approaches have been suggested in this 
situation, for example in 23|], where ratios of additive criteria are used to recover chains of 



beamlets; or algorithms implemented to extract curves from saliency networks 5l|, |42 



3.2 Longest Significant Run 

We propose an alternative approach based on the size of the longest path after discarding 
nodes with low counts within their associated region, which in effect generalizes the algorithm 
introduced in 

For a threshold r > 0, define 



1, N{Rs{in,hu 
0, otherwise. 



5'(m,hi, . . .,hd-k) = 
Then define L'^''^{a,f3) as the length of the longest path of the form 



{(m,,(t), hi(t), . . . , hd_fe(t)) ■.t = to,...,to + i-l}, 

such that S'(mzz(t), hi(t), . . . , h^_fc(t)) = 1 for all t = to; • • • ; + ^ ~ 1- If we see each 
S{m, hi, ... , hd-k) as a test at node (m, hi, ... , hd-k) and say that it is significant if it 
equals 1, then L^''^{a,P) is the length of the longest significant run (LSR). 

Theorem 3.3. With S = v i], for t large enough, 

P{L'/ia,f3)<\ogil/5)\Ho}^l, 



n 



oo. 



Also, there is a constant C = C{k, d, a, (3) such that, if s > Cnf V rj n, 

P {0\a,P) > \og{l/6)\Hi} ^1, n oo. 
Proof. The proof of Theorem 13.31 is in the Appendix. 



□ 

Therefore, the LSRT achieves the detection rate established for the GLRT (Theorem 
13. ip and its approximation (Theorem 13.21) . Moreover, it can be computed using dynamic 
programming ideas since again the paths considered are oriented and without loops. This is 
done in 

The same approach applies essentially unchanged to the case of Holder immersions, 
though with pathological cases obscurin g th e exposition, so that we omit details. 

A similar approach is advocated in 481], where a network of streams is monitored for 
pollution levels and large connected components of areas marked as problematic (polluted) 
are of interest. Two of the same researchers suggest a hybrid method in 49| applied to the 
identification of regions of interest (hot spots) in raster maps. 
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3.3 Detection in Grey-Level Images 



The resuhs we developed for point clouds can be obtained (in slightly different form) for 
digitized images, which is the context for the experiments of Section [5l 

Suppose we observe a ci- dimensional pixel array Y, with a total of n pixels, of the form: 

Y = ^ 4raph„(/*) + CrZ, 

where Z is white Gaussian noise, with independent, standard normal entries; a > is the 
noise level; /i is the signal level; and for a subset S C [0, l]*^, C,s is the array with £^-norm 1 
identifying the pixels that S intersects, namely ^5(1) oc 1{S fl pixel(i) 7^ 0}. 
We observe Y and want to decide between Hq and Hi below: 

Hq: /i = 0; 

Hi : /i > 0, and /* G H^''^^^{a, (3) is unknown. 

This setting is considered in [gI] (with a slightly different definition for ^5) in the context of 
parametric objects. 

Following the same arguments as for point clouds, we find that the GLRT asymptotically 
separates i^o and Hi if fi > C(n'=/(2«'^) v r^-'=/(2°)) with C large enough, and that a detection 
threshold of same order of magnitude is achieved both by the approximate GLRT and the 
LSRT, with 6 = r]V n'^^'^. We omit details. 



4 Beamlet and Beamlet-Like Networks 

We now focus on curves {k = 1) of Holder smoothness with a G (1,2]. The corresponding 
net described in Section [2] is made of piecewise linear functions. In this section, we show 



that such a net may be obtained by carefully chaining beamlets as suggested in [23j, yielding 
a more economical net at a comparable degree of approximation. 

The case of curves is special in the sense that it is the simplest, and in particular allows 
us to use a parametrization by arclength. Higher dimensional surfaces, even though smooth, 
may exhibit strange behavior, for example a very thin 2D surface in 3D. 



4.1 Holder Curves 

We adopt here a slightly more intrinsic definition for curves. For a G (1,2] and A, k > 0, let 
T{a, A, k) be the set of curves 7 C [0, l]'' with length(7) < A and parametrized by arclength 
with 

\\^{t)-^{s)-{t-sW{s)\\<K\t-sr, Vs,t G [0,length(7)]. (9) 

r(a. A, k) is in close correspondence with H^''^{a,P) as defined in Section 2. Note that the 
case a = 2 includes all twice differentiable curves with curvature bounded by 2k,. 
Curves in T{a, A, n) satisfy the following properties. 

Lemma 4.1. For all 7 G r(a. A, k), 

\W{t)-i{s)\\ < 2K\t-s\''-\ Vs,t. 
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Proof. Fix < s < t < length(7). The triangle inequality and (Q give 

{t - s) h'{t) - i{s) II < ||7(t) - 7(^) - i{s) {t -s)\\ + Us) - 7(t) - i{t) {s - t) II 

< 2K{t-sY. 



□ 



Lemma 4.2. Let 7 G T{a, A, k). For all arclengths r < s < t, 

< 2K{t-rY. 



S — T 

7(s) - 7(r) - (7(t) - 7(r)) 



t — r 

Proof. Applying ([9]) twice yields 

||7(s) — 7(r) — (s — 'r)7'(r)|| < k(s — r)" < — rY 

and 



s — r 



t — r 



(7(t) -7('")) - (s -r)7'(r) 



< K (s - r){t - r)°"^ < - rf. 



Then apply the triangle inequality and conclude. 



□ 



4.2 Beamlets 



Beamlets were introduced in 2D by Donoho and Huo [23|, and then in 3D by Donoho and 
Levi 25|, l20||. We define them here in any dimension d > 2. Fix a maximum scale J G N. 
Define 60 = 2~"^ and at scale j G {0, . . . , J}, define A = 2~^ . For a given coordinate 
r = 1, . . . ,d, hyperplanes of the form 

{(xi, ...,Xd):Xr = hA}, 

where h G {0,...,A^^}, are called r-hyperplanes. Such hyperplanes will be called A- 
hyperplanes; they partition the unit hypercube [0, l]'^ into smaller hypercubes of sidelength 
A that we call A-hypercubes. On each A-hyperplane, we consider a regular square grid with 
spacing Sq. Formally, we consider gridpoints of the form {hiSo, . . . , hdSo), hr = 0,1, ■■■ , Sq^, 
with at least one coordinate an integer multiple of ASq^; if this happens at the rth coordi- 
nate, we speak of an r-gridpoint, which by definition belongs to an r-hyperplane. A beamlet 
is simply a line-segment joining two gridpoints belonging to the same A-hypercube. See 
Figures O 



The beamlet graph in [23| refers to the graph with nodes the gridpoints and edges the 
beamlets. Our interest instead is in building a beamlet good-continuation graph (i.e. net- 
work), with nodes the beamlets themselves and edges between beamlets in good-continuation, 
such that paths in that network provide a useful net for smooth curves. Such a graph is 
used in some experiments performed in 23|, yet never formally defined there or elsewhere, 
though a related construction is presented in [3] for curves that are graphs of Holder func- 
tions. Formally, two beamlets are neighbors if they are of the form [61,62] and [62,63] and 
satisfy 



||(6.,2-6.,i)(63 - 62) 



(6.,3-6.,2)(62-6i)|| 

<2^'-^(||63 - 62|| + ||62-6i||), Vr 



d. (10) 
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(a) Coarsest scale, j=0 



(b) Next finer scale, j=l 




(c) Coarsest scale, j=0 (d) Next finer scale, j=l 

Figure 5: Examples of beamlets in 2D and 3D. 




1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Figure 6: Examples of 2D beamlets in good-continuation. 
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This is basically a constraint on the angle formed by [&i,&2] and [62,^3]; see Figure IH 
Let Mj J denote the resulting beamlet good-continuation network at scale j. 

Lemma 4.3. The number of beamlets at scale j is of order Mjj = 0{d?2'^^'^^^'>'^^^'^^'^'>^) and 
a given beamlet B G ^ of length \B\ has 0{d\B\-^'^-^'^) neighh 



•ors. 



Proof. The proof of Lemma 14.31 is in the Appendix. □ 

Remark. Short beamlets in the network B^j are highly connected. This is an undesirable 
feature and we were not able to avoid it, other than defining a closely related system in 
Section 14. 3[ 

To each beamlet B we associate a tubular region: 

R{B) = {x G [0, 1]"^ : min ||x - y|| < 2-'-^}. 

We extend this definition to subsets of beamlets R^ti) = UseTr 

Theorem 4.4. Fix a G (1,2] and A, k > 0. Let j(a,K) = ■ There is a universal 

constant K such that, when J is large enough and j > j{a,f3) + K, to each curve 7 G 
r(a, A, k) corresponds a path ttj in B^ j chaining at most {2d){X2^ + 2) beamlets such that 7 
is included in R{TTj). 

Proof. Theorem 14.41 is a consequence of Theorem 14.6} see the comments following Theorem 

mi □ 

Such a covering is illustrated in Figure [3 




Figure 7: Example of a curve (blue) approximation by a chain of 2D beamlets (red) in good 
continuation. 
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4.2.1 Multiscale beamlet good-continuation network 

The chaining algorithms proposed in [23] involve linking beamlets at different scales. Such 
a multiscale network may simply be constructed by taking the union of all Mj j, j = 0, . . . , J 
and adding edges between beamlets at different scale satisfying ( JTOl) with j replaced by the 
maximum of the two scales involved; see Figure [H As in Section I2.5.H an approximation 
is built using an RDP, this time of the ambient space [0, l]'^. Such a network provides 
more precise approximations to curves with varying smoothness, however at the cost of 
(substantially) increasing connectivity. 




Figure 8: Example of 2D beamlets at different scales in good continuation. 



4.3 Beams 

We found in Lemma l473l that defining a notion of good continuation directly between beamlets 
is problematic in that small corner beamlets become hubs, connected to a large number of 
other beamlets. The construction presented here uses other line-segments akin to beamlets, 
that we call beams. (Note that the term 'beam' refers to something else in [i^.) Beams at a 
given scale are of similar length, thus avoiding the problem just mentioned, and each beam 
can be well approximated by a short chain of beamlets at that same scale. 

Another drawback of beamlets as defined in Section 14.21 is that the quantization by 6q 
does not change with the scale as A does; this results in a system that is more rich than 
needed, therefore wasteful. Just as in Section [531 at scale j we define 6 = 2-'"'^, and A = 2'^ 
as before; we assume that j < J/2, so that A is an integer multiple of 6. 

We define two kinds of beams. For r = 1, . . . ,d, an r-beam is a line segment joining two 
r-gridpoints bi and 62 belonging to the same A-hypercube on opposite sides, and such that 

\\h-b2\\<A + 6. (11) 

Therefore, an r-beam makes an angle of about 45 degrees or less with the rth axis. 

For ri,r2 = 1, . . . ,d, with ri 7^ r2, an rir2-beam is a line segment joining an ri-gridpoint 
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bi and a r2-gridpoint 62 such that 

A < - 6,,,2| V \br,,l - br,,2\ < 2A, (12) 

I \bri,l - &ri,2| - l&ra.l " &r2,2| | < 5, (13) 
{\brul - bri,2\ V |6r-2,l " KiM) " l^r-g,! " KiM > S, Wr^ = 1, . . . ,d. (14) 

Note that bi and 62 do not belong to the same A-hypercube and that (HM is void in dimension 
d = 2. Hence, an rir2-beam connects an ri-hyperplane and a r2-hyperplane, making angles 
of about 45 degrees at the intersection with those hyperplanes. The reason rir2-beams are 
so restricted is that they are only used to connect ri-beams and r2-beams. 




(a) r2-beam [j ~ 0) (b) r I-beam (j = 1) (c) rlr2-beani [j — 2) 

Figure 9: Example of 2D beams. 



We define neighborhood relationships between beams as we did for beamlets in ( fTOl) : see 
Figure fTOl Specifically, two beams are neighbors if they are of the form [&i,62] and [62,^3] 
and satisfy 

\\{br,2 - br,l){b, - 62) - (&r,3 ' &r,2)(&2 " &l)|| 

< (115/20)(||63 - ^211 + \\b2 - bill), Vr = 1, . . . , rf. (15) 

For example, suppose [61, 62] and [62, ^s] are both ri-beams. If 63 is the intersection of the line 
(61, 62) with the ri-hyperplane 63 belongs to, then condition (|T5l) implies II63 — b^\\ < (5/2)5, 
so that 63 is among the 5^^"^ ri-gridpoints closest to 63. 

Let Mj J denote the resulting beam good-continuation network at scale j. 

Lemma 4.5. The number of nodes inM'^jj is of order 0{d2'^^'^^^^'^^^'^'^^'^^^ {l + d2'^^^-^)). More- 
over, all nodes have at most 2d ■ T^^^ neighbors, with most nodes having at most 2 ■ 5'^^^ 
neighbors. 

Proof. The proof of Lemma 14.51 is in the Appendix. □ 
Just as we did for beamlets, to each beam B we associate a tubular region: 

R{B) = {x G [0, l]"' : min ||x - y|| < 5}. 

yG-B 



20 



(c) Two /il/i2-beams 



Figure 10: Example of 2D beams in good continuation. 

Theorem 4.6. Fix a G (1, 2] and X, k, > 0. There is a universal constant K such that, when 

J is large enough and j > j{a, [3) + K , to each curve 7 G r(a, A, n) corresponds a path Tij{'j) 
(I 

in Mjj chaining at most X2^ + 2 beams such that 7 is included in R{TTj). 

Proof. The proof of Theorem 14.61 is in the Appendix. □ 




Figure 11: Example of a curve (blue) approximation by a chain of 2D beams (red) in good 
continuation. 

We now explain how Theorem 14.61 implies Theorem 14.41 First, any beam may be approx- 
imated within distance 60 by a chain of at most 2d beamlets at the same scale. Indeed, a 
beam touches at most 2d A-hyperplanes (at most d + 1 for a typical r-beam); on each one, 
select a (A, 5o)-gridpoint closest to the beam. By successively connecting those gridpoints 
with line-segments, a chain of beamlets is born. Therefore, a chain of beams of length at 
most X2^ + 2 may be approximated by a chain of beamlets of length at most {2d){X2^ + 2). 
That the successive beamlets in such a chain are in good continuation according to (ITOl) 
comes from the fact that the beams themselves are in good continuation according to (IT^ . 
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how beamlets are chained to approximate a beam, and the triangle inequahty, in parallel to 
how Claim 3 is established in the proof of Theorem 14. 6[ 

Remark. Both j and j provide e-nets for Holder curves embedded in the ci-dimensional 
unit hypercube, with e of order 2-'"'^. Which one is more economical? In terms of number of 
nodes, loggdC^'Jl) ~ 2dJ-{3d-l)j (Lemma[23D, while loggdlj- j|) ~ 2(rf- 1) J- (3ci-4)j 
(Lemma I4.5P : the latter is smaller for all relevant scales j < J/2. Perhaps more importantly, 
J is substantially more connected, with most nodes with 2 ■ 36"^ neighbors (2592 in 2D; 

93312 in 3D), compared to B^j, with most nodes having 2 ■ 5'^ neighbors (50 in 2D; 250 in 
3D). 



5 Numerical Experiments 

We perform some numerical experiments showing the power of approximation networks built 
on good-continuation principles. Specifically, we compare the filamentary content of simu- 
lated 3D datasets with a variety of beamlet-based algorithms of our own creation. Using 
software developed in 4^ and a basic notion of good-continuity as introduced in Section 



14.21 is enough to outperform simpler algorithms disregarding any spatial information, such 



as introduced in 20 



We consider three situations illustrated in Figure [T21 where each column corresponds to 
a different case, and for each case the goal is to distinguish between the top and bottom 
situations. In setting (a), we compare a random point cloud (top) with a set of random fila- 
ments of different lengths, orientations and curvatures (bottom). In setting (b), we compare 
a set of random filaments (top) with a set of random filaments constrained to pass through a 
small number of hubs (bottom). In setting (c), we compare a set of short random filaments 
(top) with a set of long random filaments (bottom), all filaments oriented in the direction of 
the first coordinate. 

Each dataset is a pixel array of size 64'^. The images are corrupted with a certain amount 
of additive white Gaussian noise (see Figure [T5]) . calibrated so the algorithm introduced in 
(iol (and described in Section 15.4.11) is powerless, i.e. essentially useless at distinguishing 
between the top and bottom settings. All filaments in these datasets are synthesized using 
trigonometric functions, each with randomly selected location, amplitude, frequency and 
phase shift. 

Though we introduce a variety of statistics, the workflow is the same: 

1. Compute the beamlet transform. This amounts to computing all the beamlet coeffi- 
cients, where a beamlet coefficient is defined as the line-integral of the image along the 



beamlet 23, 25 



Thresholding of the beamlet coefficients. Beamlets with a large coefficient are suspected 
to be intersecting with a filament, and therefore informative since our objective involves 
detecting filaments. We therefore focus on those beamlets with large coefficients by 



discarding those with low coefficients [23|,|25||. The choice of threshold is not determined 



in advance, rather a variety of thresholds are considered within some range. 
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(a) (b) (c) 

Figure 12: Simulated datasets: (a) random point cloud (top) vs. random filaments (bottom); 
(b) random filaments (top) vs. random filaments with hubs (bottom); (c) random short 
filaments vs. random long filaments. 






(a) Surface of the noisy data volume (b) Orthogonal slices through the noisy data 

Figure 13: Noisy version of the simulated data in the lower panel of column (a) in Figure 
[13 
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3. Construction of a good- continuation network (GCN) for each scale. Based only on the 
beamlets surviving thresholding, neighboring relationships as introduced in Section I4l2] 
are considered. 

4. Extraction of some relevant network statistics. We extract a number of statistics from 
the beamlet network that are sensitive to different degrees and kinds of filamentarity 
such as node and edge cardinality or connected components counts, centrality measures 
or even more sophisticated statistics based on path search in the network. 



Our software is based on the beamlet transform as implemented in 4^ and graph algorithms 
from the LEDA library [2I]. Our code is available online 



In what follows, we will say that a test or method is 'powerful' if it faithfully (more than 
95% of the time) distinguishes between the top and bottom settings. 



5.1 Number of Edges vs. Number of Nodes 

Our simplest algorithm looks at how the number of edges and the number of nodes vary 
together as functions of the threshold used. 

Consider column (a) of Figure [121 where we compare a random point cloud (top) with 
a set of filaments with random lengths, orientations and curvatures (bottom). In Figure 
[TH the number of edges is plotted as a function of the number of nodes for both top and 
bottom datasets. As expected, the curve corresponding to the random filaments rests above 
the curve corresponding to the random point cloud, since the GCN is more connected in 
the former setting. From our simulation studies, we found that this statistic is not powerful 
at per pixel signal-to-noise ratios (SNRs) below 0.8 for this specific situation. The dataset 
with filaments contains 20 of them with random lengths in the range [10, 64] (in number of 
pixels) . 




Figure 14: Number of edges vs. number of vertices for the random point cloud (red, solid 
curve) and the dataset of random filaments (blue, dashed curve) from column (a) of Figure 

m 
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BETWEENESS (C„) 



Figure 15: Number of vertices whose betweeness exceeds a given threshold for the dataset of 
random filaments without hubs (red, solid) and the dataset of random filaments with hubs 
(blue, dashed) from column (b) of Figure [121 



5.2 Vertex Betweeness 



Centrality attributes are well-known from network analysis |47[ and used for example in 



social networks studies ^||. The betweeness of vertex v is defined as 

Pst{v) 



Csiv) = J2 



-L -Li Pst 

where s, t run through all vertices except f , pst is the number of longest paths from s to t, 
and Pst{v) the number of longest paths from s to t that pass through a vertex v. (Note that 
shortest paths are sometimes used instead.) 

Consider column (b) in Figure [121 where the top image contains randomly distributed 
filaments and the bottom image also contains random filaments but with hubs. The number 
of filaments and their individual characteristics are the same in both cases. In Figure [TB] we 
plot the number of vertices with betweeness exceeding a given threshold for both datasets. 
As expected, the curve corresponding to the random filaments with hubs rests above the 
curve corresponding to the random filaments without hubs, since in the former case the hubs 
translate into vertices in the GCN with large betweeness. From our simulation studies, we 
also found this statistic not powerful at SNRs below 0.8 for the type of filaments chosen 
here. The data set without hubs (top) contains 20 filaments with horizontal orientations and 
random lengths in range [60,63]. The data set with hubs contains 5 groups of 4 filaments 
with horizontal orientations and length 64, then all 4 filaments in the group have common 
filamentarity region of length 3 (hub). 

Now, consider a slightly different setting with just one large hub, illustrated in the left 
column of Figure [T^l In the right column of the Figure [THl we compare the maximal betwee- 
ness of both datasets for different noise levels, which is significantly larger for the dataset 
with a hub. Again, we found that the maximal betweeness statistic is not effective when the 
voxel-level SNR is less than 0.8. The data set without hub (top) contains 18 filaments with 
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horizontal orientations and random lengths in range [48, 64]. The data set with hub contains 
18 filaments with horizontal orientations and length 64, then all filaments in the group have 
common filamentarity region of length 5 (hub). 




Figure 16: Maximal betweeness as a function of SNR, for the dataset of random filaments 
without hub (red, sohd) and the dataset of random filaments with a hub (blue, dashed) from 
the left column. 



5.3 Filamentarity Survival Index 

For each beamlet v in the network, let oj{v) denote its coefficient (or weight), i.e. the line- 
integral of the image along this beamlet. The weight of a beamlet path p is then defined as 
the sum of the weights of the beamlets it passes through: 

^(^^) = ^^('")- 

Let P be the partitioning of G into disjoint paths defined recursively as follows: 

pi = argmaXp6G^^(p); 
Pi = argmaXpgG\{pi,...,p,_i}t^(p), ^ > 1- 

We define the Filamentarity Survival Index D{t) as the fraction of paths in P with weight 
greater than t. The computation of the piecewise constant function D{t) is iterative, where 
at each iteration the Longest Weighted Path (LWP) is found and removed from the network. 
The algorithm terminates when the network is empty. 

We next define the Filamentarity Survival Ratio (FSR) which compares the filamentary 
survival index of a given dataset / with the filamentary survival index a random point cloud 
with same energy, denoted / 
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where e is a small positive number, used to avoid divide-by-zero situations. 

Consider column (c) in Figure [121 where we compare a dataset of random short filaments 
(top) with a dataset of random long filaments (bottom), all filaments oriented in the di- 
rection of the first coordinate. Figure [IT] presents the FSR curves for these two datasets. 
Additionally, this figure contains the FSR curve for the case of purely random points, as in 
the top image in column (a) in Figure [121 As expected, the FSR curve for the purely random 
dataset (black, dotted line) remains close to 1. For the dataset containing long filaments, 
however, the FSR (blue, dashed line) is significantly higher than 1 for certain values of t. 
For the dataset containing short filaments, the FSR curve (red, solid line) is found between 
the other two curves. We found that the FSR statistic is powerless at distinguishing between 
short and long filaments (in this setting) for SNRs below 1 in this specific situation. The 
data set with short filaments (top) contains 30 filaments with horizontal orientations and 
lengths 20. The data set with long filaments (bottom) contains 10 filaments with horizontal 
orientations and lengths 60. 




Figure 17: Filamentarity Survival Ratios for the dataset of random short filaments (red, 
solid) and for the dataset of random long filaments (blue, dashed) in column (c), and for the 
random point cloud (black, dotted) in column (a) of Figure [121 



5.4 Discussion 

5.4.1 Comparison with statistics based on beamlet coefficients only 

The numerical experiments we presented provide evidence that using the spatial relationship 
between beamlets allows for the design of algorithms that have the ability to perform well 
even in the case of very low SNR, where statistics based only on the beamlets coefficients fail. 
More precisely, we compared our different algorithms described above with the Log-Survival 
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Index (LSI) introduced in [20|; see Tabled! The LSI is defined as 

log(l + iV,(t)) 



iog(i + iv,-; 



where Nj (t) is the number of beamlet coefficients at the j-th scale that exceed t, and Nj is 
the total number of beamlet coefficients at scale j. 



Case 


(a) 


(b) 


(c) 


Statistic 


Graph connectivity 


Vertex betweeness 


FSR 


Lowest SNR 


0.8 (1.33) 


0.8 (2) 


1 (1.33) 



Table 1: Summary of the experiments. The numbers (resp. numbers in parentheses) in the 
last row correspond to the lowest SNR at which the statistic used for that particular case 
(resp. the LSI statistic) is still powerful. 



Figure [18] shows the behavior of this statistic for situation (a) of Figure [121 The LSI 
curves are clearly disjoint at SNR = 4; however, they merge at SNR = 0.8, resulting in the 
LSI being powerless, while the statistic presented in Section 5.1 is still powerful. 
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Figure 18: LSI curves at various scales for the random point cloud (red, solid) and for the 
dataset with random filaments (blue, dashed) in column (a) of Figure [T21 



5.4.2 Computational Complexity 



The computational burden comes from the beamlet transform. In these experiments, we 
used the version developed in 4^, which runs in order 0{n^) flops for an pixel array. The 
version based on the Fast Slant Stack 25|, [lOj is in theory faster, O(n^logn) flops, yet in 



practice the implementation in ^] is more precise and faster on smaller arrays as considered 
here. 
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We mention that beamlets have only been developed for 2D and 3D datasets, partly 
because the method suffers from the curse of dimensionality, since the number of beamlets 
increases exponentially with the ambient dimension; see Lemma 14.31 Note also that the im- 
plementation in [23,] runs in 0{n'^ log n) for an n"^ pixel array, so that the 3D implementations 
are comparatively heavier. 



5.4.3 Computation of the Longest Weighted Paths 

in the experiments described above we 
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Since finding the LWP is an NP-Hard problem 
restricted ourselves to acyclic networks. We assume that the foreground signal consists of a 
set of curves of the form 7(5) = (s, 7j^(s), 7z(s)). Therefore, it is possible to approximate such 
curves with beamlets oriented along the fist axis only, so their chaining will never induce a 
cycle. This rather artificial assumption was made only for computational reasons; for general 



datasets one would be forced to use approximations or other strategies [26|, |23[ . We tried 
statistics based on connected components, which are computationally tractable, but were 
not able to improve on the LSL 



5.4.4 Full multiscale analysis 

We built a good-continuation network for each scale separately along the lines presented 
in Section 14. 2[ Based on Theorem 14.61 this is fine if we expect filaments of homogenous 
smoothness. If we want to test for filaments of varying smoothness, building a single good- 
continuation network that includes neighboring relationships across scales may be more 
useful. 



Appendix 

Proof of Lemma 12.41 

We first count the number of nodes (m, h). There are A"'^ choices for m and 2/5(5|~|^ for each 
h^^\ so for h a total of 

JJ 2P5-^I = (2/3) ^3 5-^3 A'^^ 
NI<N 

For a given node (m, h), m has at most 2k neighbors m^'s in the square grid. And for 
each one of them, there are at most 6^^^ h^'s satisfying the left part of (^^, since at each s 
there are at most 6 choices. □ 



Proof of Theorem 12.51 

In view of Lemma l2.2[ it is enough to show that, for m, G {!,..., A"^}'^ with = m+ej 



t<la\-\s\ 



< 3. 
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By the triangle inequality, we have 



*<L"J-|s| 



< 



Xr 



E 

t<H-|s| 



t<[aj-|sl 



By definition of h^^\m., /), the first term on the right handside is bounded by 1/2 while 
the third term is bounded by 1/2 J2t ^A' — 6xp(l)/2 < 3/2. So we are left with showing 
that the second term is bounded by 1. To do that, we use Lemma 12.31 and the fact that 



A to get 



< ci/?A 



a-|s| 



Since A* = SgS^^^ for all s,t, we further get 



X-1 /■(s+te,)| 



Xy 



This concludes the proof of Theorem 12.51 



□ 



Proof of Theorem 13.21 

We use simplified notation for clarity. By Boole's Inequality, we have 

P {Mr > C6'^-''n\Ho} < \V\ ■ maxP {N{Rs{P)) > C6'^'''n\Ho} . 

For the number of paths, 

since there are 0{5~^'^'^''^^^ A'^'^^'^^^^) choices for the starting point and 6'-'^"'^'''^^ at each step 
after that, along a path of length A~^ (see Lemma [2. 6p . Hence, log(|P|) < ai5~^/" for some 
constant ai not depending on S. 

Under Hq, N{Rs{P)) is binomial with parameters n and \Rs{P)\d, so that 

maxP {N{Rs{P)) > C6'^-''n\Ho} = P |Bin(n, max |i?5(P)|rf) > C6'^-''n 
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By integrating with respect to z (the last d — k coordinates) first, we have: 

\Rs{in, hi, ... , h,_fc)|, = (Co + l)A'^5'^-^ V(m, hi, ... , h..^). (18) 
Hence, for aU P G P, |i?5(P)U = (co + l)5'^"^ and therefore, since 6 = 

maxP {N{RsiP)) > C6'^'''n\Ho} = P {Bin(n, (cq + 1)6'^-'') > C6'^-''n} . 

By standard large deviation bounds like Bernstein's Inequality [52|, the logarithm of the 
right hand side is bounded from above by —a2C6'^~^n for C > 2(co + 1), where 02 does not 
depend on 5 or n. 

Collecting terms, we get the following bound: 

logP {Mp > C6'^-''n\Ho} < a^b'^l'^ - Ca^^b'^-^n, VC > 2(co + 1). 

By choosing C = 2((co + 1) V {ai/a2)), the right hand side is negative if 5 > n^. □ 

Proof of Theorem 13.31 

We use simplified notation for clarity. By ( fTSi) . for all (m, hi, . . . , h^-k) ^ G''''^^''{a, /?), we 
have: 

P {S{m, hi, ... , hd^k) = MHq} = P {Bin(r2, ai5'=/"+^-'=) > } , 

where ai = (cq + l)(ci/3)~'^/". Let qoir) = maxP {Bin(n, aip) > rnp} over n G N and 
p G (0, 1) such that np > 1; note that qo{,T) — as r — >• cxo. In the same way that we 
obtained (fT7|l . we find that the number of paths of the form 

{(m,,(t), hi(t), . . . , ha-k{t)) :t = to,...,to + e-l} 

is bounded above by a26~'^^~''^'^'-^ A^'^''^'^^'^ , for 02 not depending on /i or rj. With 

this fact and Boole's Inequality, we get: 

P {L > i\Ho} < a25-('^-'=)^3A(^"'=)"4-'=6('^-^)^^^ ■ g^. 

Hence, with go small enough (i.e. r large enough), we have 

P{L>log(l/5)|/7o}^0, n^oo, 

where we used (|6]). We then conclude with the fact that S = 72-"/(^+°('^-'=)) v 7^. 

Assume for concreteness that rj > 0. Under Hi, let P* E V such that graph^(/*) C 
Rs{P*). Note that |graph^(/*)|d = rj''-'' and |graph^(/*) n i?5(m, hi, . . . , hd-fc)U = AV""^ 
for all (m, hi, ... , hd-k) G P*- Therefore, together with the behavior under the null, for all 
(m, hi, . . . , hd-k) G P*, we have: 

P {Sim, hi, ... , hd-k) = l\Hi}>P {Bm{n, £„A^) > rn5'^/"+'^-'= } . 
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Hence, with e„ > CnS''' ^ we have, for all (m, hi, ... , h.d-k) G P*, 

P {S(m, hi, ... , hd-k) = l\Hi} > P {Bin(n, C(ci/?)-'=/"n5'=/"+'^-'=) > . 

With T fixed as above, let qi{b) = minP {Bin(n, bp) > rnp} over n G N and p G (0, 1) such 
that np > 1; note that qi{b) — >■ 1 as 6 — >■ oo. Assume that n is large enough that > 1, so 
that n(5'=/"+'^-^ > 1. Then, for C = 6(ci/3)'=/", 

P {5(m, hi, ... , hd-k) = l\H,} > gi(6), V(m, hi, ... , h^.^) G P*. 

Now, by the Erdos-Renyi Law (in fact a modified version allowing for weak dependencies, 
found in Appendix A. 3 in [s*]), the longest significant run along P* has length at least 
log(|P*|)/ log(l/gi)(l + o(l)) with high probability. Hence, for qi large enough (i.e. C large 
enough) , 

P{L > log(l/5)|/7i} ^ 1, n^oo, 
where we used |P*| = A"'^' together with ([6]). □ 



Proof of Lemma 14.31 

There are 0{A~'^) A-hypercubes and each one of them has 0{ASq^)^~^ gridpoints on anyone 
of its 2d faces. Therefore, there are 0{A-'^d^{A5Q^f'-'^-^^) = 0{d^2'^^'^-^^^-^'^-^^^) beamlets 
at scale j. 

We now look at the degree of a beamlet B = [bi, 62] G Let r be such that |&r-,2— &r,i | = 
11^2 — ^1 II- Consider another beamlet of the form [62)^3]; and define 

03 = 02 + 7 ^(02 - Ol)- 

Or, 2 — Or,l 

If [61,^2] and [62,^3] are neighbors, 

II63-63II < 2-^1"^^-^^"" + "^^-^^"" 



I^r2,2 ~ br2,l\ 

< 2^-'{A/6o + l) 
= 0{2-^\B\-^) 

where the first inequality comes from (fT5|) and the second from the fact that for any beamlet 
[b, b'], 6 < ||6' — 6|| < A. Now, in a ball of radius A there are at most 0{A/SoY~^ r-gridpoints, 
for any r = 1, . . . , d. Therefore, B has 0{d\B\^^'^~^^) neighbors. □ 



Proof of Lemma 14.51 

The number of r-gridpoints is of order 0(A^^5^^'^^^)). For a fixed r-gridpoint 61, there are 
order 0{A6~^Y~^ r-gridpoints 62 such that [61,^2] forms an r-beam. Therefore, the number 
of r-beams is of order 0(A'^~^(5~^'^+^). 
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The number of rir2-beams is of smaller order of magnitude, because they are more 
constrained. Indeed, 6^1,2 and 6^2,2 are determined by 6^1,1 and 6^2,1 up to 0{6), which 
corresponds to order 0(1) choices; while for each ^^,2,^3 7^ ri,r2 there are order 0{A6~^) 
choices. Hence, there are order 0{A'^~^6~'^'^~^^) rir2-beams. 

All together, the number of beams is of order 0((iA'^~^5"^'^+^(l + dA^^6)). 

We now bound the degree of a vertex in B^j. By construction, an ri-beam is connected 
only with ri-beams, and (possibly) rir2-beams; similarly, an rir2-beam is connected only 
with ri" and r2-beams, and (possibly) r^ri- and r2r3-beams. 

Fix an rir2-beam [61,62]- First, take an r2-beam [62563] and define 

63 = 62 + 7 — (62 - 61), 

which is the intersection of the line (61,62) with the r2-hyperplane 63 belongs to. If [61,62] 
and [62,63] are neighbors, 

1163-6^11 < (115/20)^^-^^1^^ 

\Or2,2 — Or2,l\ 

< (lW/20)(4 + O(5)) 

< (5/2)5, for 5 small enough, 

where the first inequality comes from f[T^ and the second from f[TT]) - f[T^ . Therefore, there 
are at most 5 choices per coordinate of 63 (except for 6^3,3). Hence, [61,62] has at most 5'^~^ 
neighbors that are r2-beams. Similarly, [61, 62] has at most 5'^~^ neighbors that are ri-beams. 
Next, take an r2r3-beam [62,63] and define 

61 = 62 + 7 — (62 - 63), 

which is the intersection of the line (62,63) with the ri-hyperplane 61 belongs to. If [61,62] 
and [62,63] are neighbors, performing the corresponding computations we arrive at 

||6i - 6*11 < (115/20)(6 + 0{6)) < (7/2)5, for 6 small enough. 

Note that [62, bl] is proportional to [62, 63], with a constant of proportionality between 1 and 
2, and [62,63] satisfies (fT2l) - (fT4l) with ri replaced by r^; this together with the bound above 
(applied with = r^) and the triangle inequality implies: 



^r,,l - 6r.,,2| > |6r-o,l - 6r,,2| - 65. 



Turning things around, define 

63 = 62 + 7 7 ip2 - 61), 

"r3,2 ~ 0^3,1 

which is the intersection of the line (61, 62) with the r3-hyperplane 63 belongs to. Again using 
f[T5|) together with properties f|TT|) - f|T4|) . and the above inequahty, we get 

II63 - 6*11 < (115/20)(6 + 0(5)) < (7/2)5, for 5 small enough. 
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Therefore, there are at most 7 choices per coordinate of 63 (except for fers.s)- Hence, [&i,&2] 
has at most T''"^ neighbors that are r2r3-beams. Similarly, [61, 62] has at most T^^^ neighbors 
that are rari-beams. 

The reasoning is similar when [bi, 62] is an ri-beam. In particular, an r-beam that does not 
make an angle close a 45" with the r-hyperplanes it connects only has r-beams as neighbors, 
at most 5'^^^ on each side. □ 



Proof of Theorem 14.61 

Assume the conditions Theorem 14.61 are satisfied, with the constant K chosen as to make all 
the forthcoming appearances of 0{k,A°') sufficiently small compared to 6. Note that under 
these conditions, both 6, A and 6/ A are all decreasing functions of J. We choose a smooth 
parametrization by arc-length of 7. Let i = length(7). 

We say that . . . , Ud) G M.'^ is an r- vector if\ur\ > |MrJ, Vri. Without loss of generality, 
assume 7'(0) is an r- vector with 7^(0) > 0. We may also assume that 7(0) belongs to an r- 
hyperplane, for otherwise we work with an extension of 7 that reaches an r-hyperplane in the 
direction — 7'(0). We recursively define an increasing sequence of arclengths {si : < i < I}. 
First, let sq = 0. Suppose Si-i has been defined and assume, without loss of generality, that 
7'(si_i) is an r-vector; then, let 

Si = mi{s > Sj_i : 7(3) belongs to an r-hyperplane and |7r(s) — 7r.(sj_i)| > A}, 

with the usual convention inf{0} = 00. We may also assume that Sj < 00, for otherwise we 
work with an extension of 7 that reaches an r-hyperplane (in the case above) in the direction 
■y'{i). If Si = i, then let J = i and stop the recursion. 

Define bi to be the gridpoint closest to 7(si), and Bi = the line-segment joining 

bi and bi+i. 

• Claim 1. / < AA 1 + 2. 

• Claim 2. For i = 0, . . . , / — 1, -Bj is a vertex in j. 

— d 

• Claim 3. For i = 0, . . . , I ~ 2, Bi and -Bj+i are neighbors in B^- j. 

• Claim 4. For i = 0, ...,/- 1, 7([si, Sj+i]) C R{Bi). 

Preliminaries. 

• For A small enough, 

A<Si+i-Si<3d^/^A, Vi = 0, ...,/- 1. (19) 

Proof. We start with the lower bound. A Taylor expansion and the fact that 7 is 
parametrized by arc gives 

||7(si+i) - 7(-Si)|| < Si+i - Si, 
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and by construction, ||7(si+i) — 7(si)|| > A. 

We now turn to proving the upper bound. Suppose without loss of generahty that 
7'(sj) is an r-vector. Then, by construction |7r(s) — 7r(si)| < 2A,Vs G [si,Sj+i], and 
l7'.(sj)| > This, (|9]) and the triangle inequality imply 

2A + k{s - Si)" - - Si) > 0, Vs G [si, Si+i]. 

And for A < (S^rf^/^K)^/^"-^), the left handside is negative for s — Si replaced by Sd^^^A, 
so that Si+i - Si < 3(i^/^A. □ 



For 2 = 1, ...,/, if 7'(sj) is an ri-vector (say) and 7'(sj_i) is a r2- vector (say), then 

WrM)\ - brM)\ < ' S.-l)"-^ (20) 

Proof. Because |7r2(sj-i)| > \l'r-^{si^i)\, we have 

|7:,(^.)I - |7;(^.)I < \lrM)\ - \l'rM-l)\ - iWrM)\ " ^^(^.-l)!). 



Using Lemma I4.H this implies 



□ 



For i = 1, . . . , I, 



\\h-^{s,)\\<S/2. (21) 
Proof. By construction. □ 



Proof of Claim 1. A straightforward consequence of the fact that Sj+i — Sj > A for all 
i = 0, ...,/ — 1. The '+2' comes from possibly extending the curve as described above. 

□ 



Proof of Claim 2. We first prove that Bq is a beam. Assume without loss of generality that 
7(so) is on an ri-hyperplane and 7'(so) is a ri-vector; we now show that Bq is an ri-beam. 
Applying (j9]) and the fact that 17^2 (^o) I < bril'^o)! we get 

\lr2{Sl) - lr2{So)\ < l7ri(5o)|(si - Sq) + k(si - Sq)"- 

Using ([9]) again, together with |7ri(si) — 7ri(so)| = A, we have 

l7n(so)|(si-So) < A + k(si-So)". 

Therefore, 

I7r2(si) - 7r2(so)| < A + 2k{si - So)". 
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On the right handside, we use (fT9l) to get k{si — sq)" = 0(kA"). All together, 

l7r.(si)-7r.(so)| < A + 0(/€A"); 

similarly 

|7,3(si)-7r3(so)| < A + 0(/€A"). 
This, together with the triangle inequality and (pTI) . shows that 

\\bi -bo\\<6 + A + 0(/tA") < 2(5 + A, 

which implies that || 61 — 60 II < (5 + A since ||6i — 60 1| is an integer multiple of 6. This proves 
that Bq is an ri-beam. 

We next consider z = 1, ...,/ — 1 and prove that Bi is a beam. Assume without loss 
of generality that 7'(sj) is an ri-vector. If 7'(sj_i) is an ri-vector, then 7(sj) is on an ri- 
hyperplane and the situation is as above. Therefore, assume without loss of generality that 
7'(sj_i) is a r2-vector; we now show that Bi is an rir2-beam. With ([9]), we get 

I |7ri(Si+l)-7ri(Si)|-|7r2(Si+l)-7r2(Si)| | < {\-fl^{Si)\ - \-fl^{Si)\){Si+i - Si) + 2K{Si+i - Si)"] 

together with (12(1 . this implies 

I |7ri(Si+l) - 7ri(Si)| - |7r2(Si+l) " lr2{Si)\ \ < 6^(5^+1 - Si)", 

and with ([19]), 

I l7n(Si+l) - lri{Si)\ - |7r2(Si+l) - 7r2(Si)| | = 0(kA"). 

On the other hand, using IQ and the fact that |7ri(si)| > 17^3 (■Sj)|, we also get 

hn{si+i) - ln{si) \ - |7r3(si+i) - lrs{si)\ > -2/t(si+i - s^)", 
which by (HM implies 

hn{si+i) -7n(si)| - |7r3(si+i) - -frsisi)] > -0(fi:A°). 
Using the equations above together with the triangle inequality and (|2T1) . 

I \bri,i+l ~ bri,i\ - \br2,i+l " ^r2,i| | < ^ + 0(kA") < 2S, 
|6ri,i+l - &r-i,i| - \br3,i+l - &r3,i| > S - 0(/«A") > -26. 

Since the differences above are multiple integers of 6, the first inequality may be replaced by 
< 6 and the second by > —5. Therefore Bi is an rir2-beam. □ 

Proof of Claim 3. Fix coordinate ri; we want to show that: 

\\{bri,z+l - &n,i)(^i+2 - bi+i) - (&ri,i+2 " ) " &i)|| 

< {116/20){\\bi+2-bi+2\\ + \\bi+i-bi\\). 
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We first prove a similar inequality involving 7: 

||(7ri(Si+l) - 7r.i(Si)) (7(5^+2) - l{Si+l)) - (7ri(Si+2) " 7ri (^i+l)) (7(Si+l) - l{Si))\\ 

< 0(/cA")(||7(s,+2) -7(^m)ll + Il7(^m) -7(^.)ll). 
This simply comes from applying ([9]) to get 

7ri(Si+2) - 7ri(Si+l) = 7ri (Si+l) (Si+2 " Sj+l) + 0(/t(Si+2 - ^i+l)"), 
7ri(Si+i) - 7r-i(Si) = 7'.^(Si+i)(Si+i - Si) + 0(/«(Sj+i - Si)"), 
l{Si+2) - 7(-5i+l) = 'j'{Si+i){Si+2 - Si+l) + 0(/«(Sj+2 - Si+l)"), 

7(si+i) - 7(si) = 7'(si+i)(si+i - Si) + 0(/t(si+i - Si)°), 

and then using the triangle inequality and (fT9|) . 

Using this inequality, the triangle inequality and fl2T]) . we then get 

||(&ri,i+l - bri,i){bi+2 - h+l) - {bn,i+2 " ^n.i+l) " &i)ll 

< ((5/2 + 0(/€A"))(||6,+2 - + ll&m - + 25) 

< {5/2 + 0(/tA"))(l + A-i5)(||6,+2 - &i+2|| + ll&m - &.||), 

where we used the fact that any beam has (as a vector) sup norm at least A. We conclude by 
making 0(/tA") sufficiently small compared with 5 and 5 itself sufficiently small compared 
with A. 

Therefore, Bi = [6i,6i+i] and Bi^i = [64+1,64+2] are neighbors in B^- j. □ 

Proof of Claim 4- Applying Lemma IT2] with r = Si and t = Si+i, it follows that 7([si, Sj+i]) 
belongs to the 2K(si+i — Si)"-neighborhood of [7(si), 7(si+i)]. Because of ( HM . fl?I]) and the 
triangle inequality, this implies that 7([si, Sj+i]) C R{Bi). □ 
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